Ontology Population using Corpus Statistics

Authors

Rogelio Nazar

Irene Renau

Abstract

This paper presents a combination of algorithms for automatic ontology building based mainly on lexical cooccurrence statistics. We populate an ontology with hypernymy links, so we refer more specifically to a taxonomy of lexical units (nouns organized by hypernymy relations) rather than an ontology of formally defined concepts. A set of combined statistical procedures produces fragments of taxonomies from corpora, which a central algorithm later integrates into a unified taxonomy. Our results show that with an ensemble of different components it is possible to achieve an accuracy only slightly worse than human performance. Finally, as our methods are based on quantitative linguistics, the algorithm we propose is not language-specific. The language used for the experiments is, however, Spanish.
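To make the idea of deriving hypernymy links from cooccurrence statistics more concrete, here is a minimal sketch. It is not the authors' algorithm: the asymmetry heuristic (a noun tends to cooccur with its hypernym more reliably than the reverse), the function names, the `min_cooc` threshold, and the toy Spanish contexts are all assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_stats(contexts):
    """Count in how many contexts each word, and each ordered word pair, occurs."""
    word_freq = Counter()
    pair_freq = Counter()
    for tokens in contexts:
        vocab = set(tokens)  # count each word once per context
        word_freq.update(vocab)
        pair_freq.update(permutations(sorted(vocab), 2))
    return word_freq, pair_freq


def candidate_hypernym(word, word_freq, pair_freq, min_cooc=2):
    """Pick the cooccurring word b that maximizes P(b | word).

    Assumed heuristic (not taken from the paper): b is only accepted
    when P(b | word) > P(word | b), i.e. the association is asymmetric.
    """
    best, best_score = None, 0.0
    for (a, b), n in pair_freq.items():
        if a != word or n < min_cooc:
            continue
        p_b_given_a = n / word_freq[a]
        p_a_given_b = n / word_freq[b]
        if p_b_given_a > p_a_given_b and p_b_given_a > best_score:
            best, best_score = b, p_b_given_a
    return best


# Hypothetical toy corpus: each list is the set of nouns found in one context.
contexts = [
    ["perro", "animal"],
    ["perro", "animal", "gato"],
    ["gato", "animal"],
    ["caballo", "animal"],
    ["perro", "hueso"],
]
word_freq, pair_freq = cooccurrence_stats(contexts)
perro_hypernym = candidate_hypernym("perro", word_freq, pair_freq)
animal_hypernym = candidate_hypernym("animal", word_freq, pair_freq)
```

On this toy data, "perro" is linked to "animal", while "animal" gets no candidate because none of its associations is asymmetric in its favor. A real system would need a large corpus, noun filtering, and the integration step the abstract mentions to merge such fragments into one taxonomy.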

Similar resources

The GENIA Corpus: an Annotated Research Abstract Corpus in Molecular Biology Domain

With the information overload in the genome-related field, there is an increasing need for natural language processing technology to extract information from the literature, and various attempts at information extraction using NLP have been made. We are developing the necessary resources, including a domain ontology and an annotated corpus from research abstracts in the MEDLINE database (GENIA corpus). We are ...

Dealing with Large Corpora for Ontology Population

Multilingual ontology population from texts, i.e. addition of new terms in an ontology, requires a suitable parallel or comparable corpus. In this paper, we aim to check whether the corpus selected for our project suits the ontology we want to populate. The corpus for ontology population should not only reflect a specific domain and have a sufficient volume of data, as discussed in (Delpech et ...

Ontoprima: a Prototype for Automating Ontology Population

Ontology Population supports the process of building ontologies in the complex task of instantiating an ontology. Performing this process manually is both expensive and time-consuming; this logically leads to attempts at fully or partially automating the process of acquisition and absorption of knowledge in general and the process of Ontology Population in particular. This paper presents OntoPRiMa...

Populating Categories using Constrained Matrix Factorization

Matrix factorization methods are a well-scalable means of discovering generalizable information in noisy training data with many examples and many features. We propose a method to populate a given ontology of categories and seed examples using matrix factorization with constraints, based on a large corpus of noun-phrase/context cooccurrence statistics. While our method performs reasonably well ...

Rule-based Named Entity Extraction For Ontology Population

Currently, text analysis techniques such as named entity recognition rely mainly on ontologies, which represent the semantics of an application domain. To build such an ontology from specialized texts, this article presents a tool that detects proper names, locations and dates in texts by using manually written linguistic rules. The most challenging task is to extract not only entities but al...

Publication date: 2015
